-
Notifications
You must be signed in to change notification settings - Fork 943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic multi-partition GroupBy
support to cuDF-Polars
#17503
Add basic multi-partition GroupBy
support to cuDF-Polars
#17503
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
# Check that we are grouping on element-wise | ||
# keys (is this already guaranteed?) | ||
for ne in ir.keys: | ||
if not isinstance(ne.value, Col): # pragma: no cover | ||
return _single_fallback(ir, children, partition_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by elementwise keys? It's certainly not the case that we always group on columns. But I think it is the case that the group keys (if expressions) are trivially elementwise (e.g. a + b
as a key is fine, but a.unique()
or a.sort()
is not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I'm being extra cautious by requiring the keys to be Col
. This comment is essentially asking: "can we drop this check altogether? ie. Will the keys always be element-wise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened pola-rs/polars#20152 as well
This PR is pretty-much "ready" - I don't think it makes sense to build more groupby logic directly on top of this. It would be much better to revise the underlying |
@wence- I'm marking this as "ready for review". As we discussed offline, this will need to be revised for general support. However, the current design should be sufficient for basic TPCH support/testing. |
.items() | ||
if c in groupby_key_columns | ||
} | ||
if cardinality_factor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "cardinality factor" logic is new, and can be pulled out of the PR if necessary. However, we do need a mechanism to trigger shuffle-based groupby aggregations in practice.
@wence- Any action items left for this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Rick, I think this is looking pretty good. I have some suggestions for some cleanup, and one worry about the schema
s we're using.
ir.schema, | ||
ir.keys, | ||
piecewise_exprs, | ||
ir.maintain_order, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can support maintain_order == True
(at least easily). So perhaps we should raise in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we support it for the tree reduction, but not for a shuffle-based reduction, right? The tree-reduction tasks should be ordered appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true.
Thanks for the review @wence- ! I tried addressing your comments - but I suppose I had to rewrite most of the code in the process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nits, thanks for the iteration here Rick!
ir.schema, | ||
ir.keys, | ||
piecewise_exprs, | ||
ir.maintain_order, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true.
/merge |
Description
Adds multi-partition support for simple
GroupBy
aggregations (following the same design as #17441)Checklist